PASSNYC, a not-for-profit organization that facilitates collective impact, is dedicated to broadening educational opportunities for New York City's talented and underserved students. New York City is home to some of the most impressive educational institutions in the world, yet in recent years the City's specialized high schools (institutions with a historically transformative impact on student outcomes) have seen a shift toward more homogeneous student body demographics.
PASSNYC uses public data to identify students within New York City’s under-performing school districts and, through consulting and collaboration with partners, aims to increase the diversity of students taking the Specialized High School Admissions Test (SHSAT). By focusing efforts in under-performing areas that are historically underrepresented in SHSAT registration, we will help pave the path to specialized high schools for a more diverse group of students.
With limited time and resources, PASSNYC must be strategic in systematically improving the diversity pipeline and the social mobility of disadvantaged groups into the specialized high schools. The main question, then, is: where will PASSNYC's investment in its services (after-school programs, test preparation, mentoring, and resources for parents) produce the greatest return on investment (ROI)?
pctELL: percent of English Language Learners
Let’s load some libraries we will use.
library(tidyverse)
## -- Attaching packages ------------------------------------------------------------------------------------------------------ tidyverse 1.2.1 --
## v ggplot2 3.0.0 v purrr 0.2.5
## v tibble 1.4.2 v dplyr 0.7.6
## v tidyr 0.8.1 v stringr 1.3.1
## v readr 1.1.1 v forcats 0.3.0
## -- Conflicts --------------------------------------------------------------------------------------------------------- tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
library(ggplot2)
library(digest)
library(corrplot)
## corrplot 0.84 loaded
library(ggmap)
library(GGally)
##
## Attaching package: 'GGally'
## The following object is masked from 'package:dplyr':
##
## nasa
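# Helper script from STHDA that defines rquery.cormat() for correlation matrices (downloaded at run time)
source("http://www.sthda.com/upload/rquery_cormat.r")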
Let's load our data from the two CSV files:
reg <- read_csv("D5 SHSAT Registrations and Testers.csv")
school <- read_csv("2016 School Explorer.csv")
Before we proceed, let’s take a look at the initial data.
head(reg)
## # A tibble: 6 x 7
## DBN `School name` `Year of SHST` `Grade level` `Enrollment on ~
## <chr> <chr> <int> <int> <int>
## 1 05M0~ P.S. 046 Art~ 2013 8 91
## 2 05M0~ P.S. 046 Art~ 2014 8 95
## 3 05M0~ P.S. 046 Art~ 2015 8 73
## 4 05M0~ P.S. 046 Art~ 2016 8 56
## 5 05M1~ P.S. 123 Mah~ 2013 8 62
## 6 05M1~ P.S. 123 Mah~ 2014 8 62
## # ... with 2 more variables: `Number of students who registered for the
## # SHSAT` <int>, `Number of students who took the SHSAT` <int>
head(school)
## # A tibble: 6 x 161
## `Adjusted Grade` `New?` `Other Location~ `School Name` `SED Code`
## <chr> <chr> <chr> <chr> <dbl>
## 1 <NA> <NA> <NA> P.S. 015 ROB~ 3.10e11
## 2 <NA> <NA> <NA> P.S. 019 ASH~ 3.10e11
## 3 <NA> <NA> <NA> P.S. 020 ANN~ 3.10e11
## 4 <NA> <NA> <NA> P.S. 034 FRA~ 3.10e11
## 5 <NA> <NA> <NA> THE STAR ACA~ 3.10e11
## 6 <NA> <NA> <NA> P.S. 064 ROB~ 3.10e11
## # ... with 156 more variables: `Location Code` <chr>, District <int>,
## # Latitude <dbl>, Longitude <dbl>, `Address (Full)` <chr>, City <chr>,
## # Zip <int>, Grades <chr>, `Grade Low` <chr>, `Grade High` <chr>,
## # `Community School?` <chr>, `Economic Need Index` <chr>, `School Income
## # Estimate` <chr>, `Percent ELL` <chr>, `Percent Asian` <chr>, `Percent
## # Black` <chr>, `Percent Hispanic` <chr>, `Percent Black /
## # Hispanic` <chr>, `Percent White` <chr>, `Student Attendance
## # Rate` <chr>, `Percent of Students Chronically Absent` <chr>, `Rigorous
## # Instruction %` <chr>, `Rigorous Instruction Rating` <chr>,
## # `Collaborative Teachers %` <chr>, `Collaborative Teachers
## # Rating` <chr>, `Supportive Environment %` <chr>, `Supportive
## # Environment Rating` <chr>, `Effective School Leadership %` <chr>,
## # `Effective School Leadership Rating` <chr>, `Strong Family-Community
## # Ties %` <chr>, `Strong Family-Community Ties Rating` <chr>, `Trust
## # %` <chr>, `Trust Rating` <chr>, `Student Achievement Rating` <chr>,
## # `Average ELA Proficiency` <chr>, `Average Math Proficiency` <chr>,
## # `Grade 3 ELA - All Students Tested` <int>, `Grade 3 ELA 4s - All
## # Students` <int>, `Grade 3 ELA 4s - American Indian or Alaska
## # Native` <int>, `Grade 3 ELA 4s - Black or African American` <int>,
## # `Grade 3 ELA 4s - Hispanic or Latino` <int>, `Grade 3 ELA 4s - Asian
## # or Pacific Islander` <int>, `Grade 3 ELA 4s - White` <int>, `Grade 3
## # ELA 4s - Multiracial` <int>, `Grade 3 ELA 4s - Limited English
## # Proficient` <int>, `Grade 3 ELA 4s - Economically
## # Disadvantaged` <int>, `Grade 3 Math - All Students tested` <int>,
## # `Grade 3 Math 4s - All Students` <int>, `Grade 3 Math 4s - American
## # Indian or Alaska Native` <int>, `Grade 3 Math 4s - Black or African
## # American` <int>, `Grade 3 Math 4s - Hispanic or Latino` <int>, `Grade
## # 3 Math 4s - Asian or Pacific Islander` <int>, `Grade 3 Math 4s -
## # White` <int>, `Grade 3 Math 4s - Multiracial` <int>, `Grade 3 Math 4s
## # - Limited English Proficient` <int>, `Grade 3 Math 4s - Economically
## # Disadvantaged` <int>, `Grade 4 ELA - All Students Tested` <int>,
## # `Grade 4 ELA 4s - All Students` <int>, `Grade 4 ELA 4s - American
## # Indian or Alaska Native` <int>, `Grade 4 ELA 4s - Black or African
## # American` <int>, `Grade 4 ELA 4s - Hispanic or Latino` <int>, `Grade 4
## # ELA 4s - Asian or Pacific Islander` <int>, `Grade 4 ELA 4s -
## # White` <int>, `Grade 4 ELA 4s - Multiracial` <int>, `Grade 4 ELA 4s -
## # Limited English Proficient` <int>, `Grade 4 ELA 4s - Economically
## # Disadvantaged` <int>, `Grade 4 Math - All Students Tested` <int>,
## # `Grade 4 Math 4s - All Students` <int>, `Grade 4 Math 4s - American
## # Indian or Alaska Native` <int>, `Grade 4 Math 4s - Black or African
## # American` <int>, `Grade 4 Math 4s - Hispanic or Latino` <int>, `Grade
## # 4 Math 4s - Asian or Pacific Islander` <int>, `Grade 4 Math 4s -
## # White` <int>, `Grade 4 Math 4s - Multiracial` <int>, `Grade 4 Math 4s
## # - Limited English Proficient` <int>, `Grade 4 Math 4s - Economically
## # Disadvantaged` <int>, `Grade 5 ELA - All Students Tested` <int>,
## # `Grade 5 ELA 4s - All Students` <int>, `Grade 5 ELA 4s - American
## # Indian or Alaska Native` <int>, `Grade 5 ELA 4s - Black or African
## # American` <int>, `Grade 5 ELA 4s - Hispanic or Latino` <int>, `Grade 5
## # ELA 4s - Asian or Pacific Islander` <int>, `Grade 5 ELA 4s -
## # White` <int>, `Grade 5 ELA 4s - Multiracial` <int>, `Grade 5 ELA 4s -
## # Limited English Proficient` <int>, `Grade 5 ELA 4s - Economically
## # Disadvantaged` <int>, `Grade 5 Math - All Students Tested` <int>,
## # `Grade 5 Math 4s - All Students` <int>, `Grade 5 Math 4s - American
## # Indian or Alaska Native` <int>, `Grade 5 Math 4s - Black or African
## # American` <int>, `Grade 5 Math 4s - Hispanic or Latino` <int>, `Grade
## # 5 Math 4s - Asian or Pacific Islander` <int>, `Grade 5 Math 4s -
## # White` <int>, `Grade 5 Math 4s - Multiracial` <int>, `Grade 5 Math 4s
## # - Limited English Proficient` <int>, `Grade 5 Math 4s - Economically
## # Disadvantaged` <int>, `Grade 6 ELA - All Students Tested` <int>,
## # `Grade 6 ELA 4s - All Students` <int>, `Grade 6 ELA 4s - American
## # Indian or Alaska Native` <int>, `Grade 6 ELA 4s - Black or African
## # American` <int>, ...
Our data looks fairly clean, but there is still work to do. We need to rename variables, convert numeric fields that were read as text to numbers, and join our two tables. head(school) shows "Percent ELL" and "Economic Need Index" as <chr>, for example. Let's move on to cleaning.
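head(school) shows that fields such as "Percent ELL" and "School Income Estimate" were read as text (<chr>), but the cleaned table stores pctELL and income as numbers. Below is a minimal base-R sketch of that conversion. It assumes the raw strings look like "9%" and "$31,141.72", which you should check against the actual CSV. The helper names pct_to_num() and money_to_num() are hypothetical, and the toy values are made up.

```r
# ASSUMPTION: percentages are stored like "9%" and incomes like "$31,141.72 ".

# "9%" -> 0.09; blank or missing entries become NA
pct_to_num <- function(x) {
  x <- trimws(x)
  x[which(x == "")] <- NA
  as.numeric(sub("%", "", x, fixed = TRUE)) / 100
}

# "$31,141.72 " -> 31141.72 (strip dollar signs, commas, and spaces)
money_to_num <- function(x) {
  x <- gsub("[$, ]", "", x)
  x[which(x == "")] <- NA
  as.numeric(x)
}

# Toy stand-in for two columns of `school` (made-up values)
school_toy <- data.frame(
  `Percent ELL` = c("9%", "91%", NA),
  `School Income Estimate` = c("$31,141.72 ", "$56,462.88 ", ""),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

school_toy$pctELL <- pct_to_num(school_toy$`Percent ELL`)
school_toy$income <- money_to_num(school_toy$`School Income Estimate`)
school_toy[, c("pctELL", "income")]
```

readr's parse_number() can also strip "%", "$", and thousands separators. Note that it would leave "9%" as 9 rather than 0.09.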
To make the analysis easier, we clean reg first.
We begin by renaming the columns we plan to use.
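The renaming code is not shown here, so the following is only a sketch of one way to do it in base R. It uses a lookup vector that maps the original column names (as printed by head(reg)) to short names. Apart from registered and took, which appear in school_clean, the short names are illustrative assumptions, and so are the toy rows.

```r
# Toy stand-in for `reg` (made-up rows); column names as printed by head(reg)
reg_toy <- data.frame(
  DBN = c("D5-A", "D5-A", "D5-B"),
  `School name` = c("School A", "School A", "School B"),
  `Year of SHST` = c(2013L, 2014L, 2013L),
  `Grade level` = c(8L, 8L, 8L),
  `Number of students who registered for the SHSAT` = c(20L, 24L, 10L),
  `Number of students who took the SHSAT` = c(9L, 12L, 4L),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Lookup table: original name -> short name (illustrative choices)
rename_map <- c(
  "School name" = "school",
  "Year of SHST" = "year",
  "Grade level" = "grade",
  "Number of students who registered for the SHSAT" = "registered",
  "Number of students who took the SHSAT" = "took"
)

# Rename only the columns listed in the map; leave others (e.g. DBN) unchanged
hit <- names(reg_toy) %in% names(rename_map)
names(reg_toy)[hit] <- unname(rename_map[names(reg_toy)[hit]])

names(reg_toy)
```

With dplyr loaded, rename(reg, school = `School name`, ...) does the same thing using new = old pairs.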
Before continuing, there are three things we need to consider when analyzing our data:
We consider a school to be economically stratified if its economic need, as measured by the Economic Need Index (ENI)[1], is more than 10 percentage points from the citywide average. A school can be stratified in either direction, by serving disproportionately more low-income or more high-income children. New York City reports that 70.6% of schools are economically stratified today.
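The definition above translates directly into a flag. ENI is on a 0-1 scale, so "10 percentage points" is a difference of 0.10. This sketch approximates the citywide average with the unweighted mean of school ENIs, which is an assumption because the city's own figure may be weighted differently. The ENI values are made up.

```r
# Toy ENI values for six schools (made up)
eni <- c(0.92, 0.64, 0.74, 0.86, 0.73, 0.30)

# ASSUMPTION: citywide average = unweighted mean of the schools' ENIs
citywide_avg <- mean(eni)

# Stratified if more than 0.10 away from the average, in either direction
stratified <- abs(eni - citywide_avg) > 0.10
direction <- ifelse(!stratified, "not stratified",
                    ifelse(eni > citywide_avg, "higher need", "lower need"))

data.frame(eni, stratified, direction)
mean(stratified)  # share of stratified schools
```

On the real data, mean(stratified) is the figure to compare with the city's 70.6%, keeping in mind that the averaging choice affects the result.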
## # A tibble: 6 x 67
## school DBN district lat long address City zip commSchool eni
## <chr> <chr> <int> <dbl> <dbl> <chr> <chr> <int> <chr> <dbl>
## 1 P.S. ~ 01M0~ 1 40.7 -74.0 333 E ~ NEW ~ 10009 Yes 0.919
## 2 P.S. ~ 01M0~ 1 40.7 -74.0 185 1S~ NEW ~ 10003 No 0.641
## 3 P.S. ~ 01M0~ 1 40.7 -74.0 166 ES~ NEW ~ 10002 No 0.744
## 4 P.S. ~ 01M0~ 1 40.7 -74.0 730 E ~ NEW ~ 10009 No 0.86
## 5 THE S~ 01M0~ 1 40.7 -74.0 121 E ~ NEW ~ 10009 No 0.73
## 6 P.S. ~ 01M0~ 1 40.7 -74.0 600 E ~ NEW ~ 10009 No 0.858
## # ... with 57 more variables: income <dbl>, pctELL <dbl>, pctAttend <dbl>,
## # pctAbsentChronic <dbl>, pctRigor <dbl>, ratingRigor <chr>,
## # pctCollab <dbl>, ratingCollab <chr>, pctSupp <dbl>, ratingSupp <chr>,
## # pctLeader <dbl>, ratingLeader <chr>, pctCommunity <dbl>,
## # ratingCommunity <chr>, pctTrust <dbl>, ratingTrust <chr>, `Student
## # Achievement Rating` <chr>, avgELA <dbl>, avgMath <dbl>, elaAll <int>,
## # elaAll4 <int>, elaBlack <int>, elaHispanic <int>, elaAsian <int>,
## # elaWhite <int>, mathAll <int>, mathAll4 <int>, mathBlack <int>,
## # mathHispanic <int>, mathAsian <int>, mathWhite <int>, enroll <dbl>,
## # registered <dbl>, took <dbl>, regPct <dbl>, tookPct <dbl>,
## # yield <dbl>, academicScore <dbl>, quantRigor <dbl>, quantCollab <dbl>,
## # quantSupp <dbl>, quantLeader <dbl>, quantCommunity <dbl>,
## # quantTrust <dbl>, pctELA4 <dbl>, pctELABlack <dbl>,
## # pctELAHispanic <dbl>, pctELAAsian <dbl>, pctELAWhite <dbl>,
## # pctMath4 <dbl>, pctMathBlack <dbl>, pctMathHispanic <dbl>,
## # pctMathAsian <dbl>, pctMathWhite <dbl>, URM4 <int>, race <fct>,
## # percent <dbl>
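school_clean carries enroll, registered, and took over from reg. Because reg only covers District 5 (its filename starts with "D5" and its DBNs with "05M"), a left join leaves those columns NA for every school outside the district. That matches the 1,251 NAs for these columns in the summary further down. The non-integer values of registered (e.g. a median of 22.62) suggest that counts were averaged across years before joining. This base-R sketch assumes that, and also assumes yield = took / registered. The toy rows are made up.

```r
# Toy school table: one row per school (made-up values)
school_toy <- data.frame(
  DBN = c("05M046", "05M123", "01M015", "02M001"),
  eni = c(0.80, 0.85, 0.92, 0.40),
  stringsAsFactors = FALSE
)

# Toy reg table: one row per school and year, District 5 only
reg_toy <- data.frame(
  DBN = c("05M046", "05M046", "05M123", "05M123"),
  year = c(2013L, 2014L, 2013L, 2014L),
  registered = c(20, 24, 10, 12),
  took = c(9, 12, 4, 6),
  stringsAsFactors = FALSE
)

# ASSUMPTION: average each school's counts across years
reg_avg <- aggregate(cbind(registered, took) ~ DBN, data = reg_toy, FUN = mean)

# Left join: keep every school; schools outside District 5 get NA
joined <- merge(school_toy, reg_avg, by = "DBN", all.x = TRUE)
joined$yield <- joined$took / joined$registered  # ASSUMPTION: yield definition
joined
```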
Most NYC schools have small White and Asian populations but large URM (underrepresented minority, here Black and Hispanic) populations. Where are these schools located?
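The summary statistics further down support this claim. The median school is 4% Asian, 3% White, and 90% Black or Hispanic. The race and percent columns at the end of school_clean suggest that the race shares were also reshaped into long format, presumably for plotting by group. The sketch below does both steps in base R, with made-up shares for three schools. The reshaping step is an assumption about how those columns were built.

```r
# Toy race shares for three schools (made up)
toy <- data.frame(
  school = c("A", "B", "C"),
  pctAsian = c(0.02, 0.40, 0.05),
  pctBlack = c(0.60, 0.10, 0.30),
  pctHispanic = c(0.35, 0.20, 0.60),
  pctWhite = c(0.03, 0.30, 0.05),
  stringsAsFactors = FALSE
)
race_cols <- c("pctAsian", "pctBlack", "pctHispanic", "pctWhite")

# Median share of each group across schools
medians <- sapply(toy[race_cols], median)
medians

# Long format: one row per school and race group.
# stack() stacks the columns in order, so the school ids repeat once per group.
long <- data.frame(school = rep(toy$school, times = length(race_cols)),
                   stack(toy[race_cols]))
names(long) <- c("school", "percent", "race")
long
```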
Broken down by district, most of our students come from Districts 9, 10, 31, 2, and 27.
Our graphs show that schools are racially stratified within NYC. Asian and White students are relatively spread out, but they are concentrated in the more affluent areas of the city (e.g., Manhattan and Staten Island). Interestingly, the URM groups are also separated from each other. Hispanic students are mostly in the northern parts of the Bronx, Brooklyn, and Queens, while Black students are concentrated in the heart of Brooklyn, with some in Harlem. When we combine the two URM groups, three clear clusters emerge: the Bronx, Brooklyn-Queens, and a cluster we did not see before in West New Brighton (Staten Island).
This exploratory plot suggests that race and income are intertwined through stratification, so let's take a closer look at ENI across NYC:
As suspected, schools with a high URM share tend to have a high ENI. To explore further, we regress ENI on the share of Black and Hispanic students:
##
## Call:
## lm(formula = eni ~ pctBlackHispanic, data = school_clean)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.45278 -0.08124 0.01480 0.08578 0.53042
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.26611 0.01011 26.32 <2e-16 ***
## pctBlackHispanic 0.55578 0.01283 43.32 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.1333 on 1246 degrees of freedom
## (25 observations deleted due to missingness)
## Multiple R-squared: 0.601, Adjusted R-squared: 0.6007
## F-statistic: 1877 on 1 and 1246 DF, p-value: < 2.2e-16
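As we can see, URM share is strongly and positively correlated with ENI, while the share of White students is negatively correlated with ENI. The regression above quantifies the first relationship. pctBlackHispanic alone explains about 60% of the variance in ENI (R-squared = 0.601, which corresponds to a correlation of roughly 0.78). A 10-percentage-point increase in Black and Hispanic enrollment is associated with an ENI about 0.056 higher. This tells us that schools with a larger URM population can be expected to have a higher economic need.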
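The 0.6 in the output above is an R-squared, not a correlation. In a regression with a single predictor, R-squared equals the square of the Pearson correlation, so the correlation itself is the square root. This small check uses made-up data.

```r
# Toy data (made up): x = URM share, y = ENI
x <- c(0.10, 0.30, 0.50, 0.70, 0.90, 0.95)
y <- c(0.35, 0.40, 0.60, 0.62, 0.80, 0.85)

fit <- lm(y ~ x)
r <- cor(x, y)

# With one predictor, Multiple R-squared equals the squared correlation
summary(fit)$r.squared
r^2

# The reported R-squared of 0.601 therefore corresponds to r = sqrt(0.601)
sqrt(0.601)
```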
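Now let's look at academic performance, since it is the main outcome PASSNYC aims to improve. Judging by the summary statistics below, academicScore appears to be the sum of avgELA and avgMath: their means, 2.534 and 2.668, add up to academicScore's mean of 5.202.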
##
## Call:
## lm(formula = academicScore ~ eni + pctBlackHispanic, data = school_clean)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.25036 -0.30770 -0.06941 0.22148 2.65450
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 7.21051 0.04629 155.78 <2e-16 ***
## eni -1.78673 0.10400 -17.18 <2e-16 ***
## pctBlackHispanic -1.10934 0.07460 -14.87 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.4846 on 1215 degrees of freedom
## (55 observations deleted due to missingness)
## Multiple R-squared: 0.6523, Adjusted R-squared: 0.6517
## F-statistic: 1140 on 2 and 1215 DF, p-value: < 2.2e-16
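Economic need is a strong predictor of school performance. Holding the URM share constant, an ENI 0.1 higher is associated with an academicScore about 0.18 points lower, and the two predictors together explain about 65% of the variance (R-squared = 0.652). This makes sense: the greater students' needs, the lower their scores tend to be, plausibly because of resource insecurity. As we found earlier, ENI and race are strongly correlated. Even so, some schools may produce a high number of 4s (the highest performance level on the state ELA and math exams) despite a high ENI.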
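To make the coefficients concrete, this sketch plugs the printed estimates into the fitted equation by hand. The function name predict_score() is hypothetical. On the fitted model itself, predict(model, newdata = ...) would do the same job. Because ENI and pctBlackHispanic are correlated, each coefficient describes the change in academicScore while the other predictor is held fixed.

```r
# Coefficients copied from the lm() output above
b <- c(intercept = 7.21051, eni = -1.78673, pctBlackHispanic = -1.10934)

# Hypothetical helper: predicted academicScore for a given ENI and Black/Hispanic share
predict_score <- function(eni, pctBH) {
  unname(b["intercept"] + b["eni"] * eni + b["pctBlackHispanic"] * pctBH)
}

# Same URM share (0.90), ENI 0.10 higher: the prediction falls by 0.1 * 1.78673
low_need  <- predict_score(0.80, 0.90)
high_need <- predict_score(0.90, 0.90)
c(low_need = low_need, high_need = high_need, difference = high_need - low_need)
```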
## school DBN district lat
## Length:1273 Length:1273 Min. : 1.00 Min. :40.51
## Class :character Class :character 1st Qu.: 9.00 1st Qu.:40.67
## Mode :character Mode :character Median :15.00 Median :40.72
## Mean :16.13 Mean :40.73
## 3rd Qu.:24.00 3rd Qu.:40.82
## Max. :32.00 Max. :40.90
##
## long address City zip
## Min. :-74.24 Length:1273 Length:1273 Min. :10001
## 1st Qu.:-73.96 Class :character Class :character 1st Qu.:10452
## Median :-73.92 Mode :character Mode :character Median :11203
## Mean :-73.92 Mean :10815
## 3rd Qu.:-73.88 3rd Qu.:11232
## Max. :-73.71 Max. :11694
##
## commSchool eni income pctELL
## Length:1273 Min. :0.0490 Min. : 16902 Min. :0.0000
## Class :character 1st Qu.:0.5500 1st Qu.: 33610 1st Qu.:0.0400
## Mode :character Median :0.7310 Median : 43151 Median :0.0900
## Mean :0.6724 Mean : 48443 Mean :0.1248
## 3rd Qu.:0.8410 3rd Qu.: 58518 3rd Qu.:0.1700
## Max. :0.9570 Max. :181382 Max. :0.9900
## NA's :25 NA's :397
## pctAsian pctBlack pctHispanic pctBlackHispanic
## Min. :0.0000 Min. :0.0000 Min. :0.0200 Min. :0.0300
## 1st Qu.:0.0100 1st Qu.:0.0600 1st Qu.:0.1800 1st Qu.:0.4900
## Median :0.0400 Median :0.2400 Median :0.3600 Median :0.9000
## Mean :0.1164 Mean :0.3202 Mean :0.4115 Mean :0.7316
## 3rd Qu.:0.1400 3rd Qu.:0.5600 3rd Qu.:0.6400 3rd Qu.:0.9600
## Max. :0.9500 Max. :0.9700 Max. :1.0000 Max. :1.0000
##
## pctWhite pctAttend pctAbsentChronic pctRigor
## Min. :0.0000 Min. :0.0000 Min. :0.0000 Min. :0.0000
## 1st Qu.:0.0100 1st Qu.:0.9200 1st Qu.:0.1100 1st Qu.:0.8600
## Median :0.0300 Median :0.9400 Median :0.2000 Median :0.9000
## Mean :0.1315 Mean :0.9272 Mean :0.2159 Mean :0.8948
## 3rd Qu.:0.1600 3rd Qu.:0.9500 3rd Qu.:0.3000 3rd Qu.:0.9400
## Max. :0.9200 Max. :1.0000 Max. :1.0000 Max. :1.0000
## NA's :25 NA's :25 NA's :25
## ratingRigor pctCollab ratingCollab pctSupp
## Length:1273 Min. :0.0000 Length:1273 Min. :0.0000
## Class :character 1st Qu.:0.8500 Class :character 1st Qu.:0.8400
## Mode :character Median :0.9000 Mode :character Median :0.8900
## Mean :0.8843 Mean :0.8875
## 3rd Qu.:0.9400 3rd Qu.:0.9400
## Max. :1.0000 Max. :1.0000
## NA's :25 NA's :25
## ratingSupp pctLeader ratingLeader pctCommunity
## Length:1273 Min. :0.0000 Length:1273 Min. :0.0000
## Class :character 1st Qu.:0.7600 Class :character 1st Qu.:0.8000
## Mode :character Median :0.8300 Mode :character Median :0.8300
## Mean :0.8161 Mean :0.8309
## 3rd Qu.:0.8900 3rd Qu.:0.8700
## Max. :0.9900 Max. :0.9900
## NA's :25 NA's :25
## ratingCommunity pctTrust ratingTrust
## Length:1273 Min. :0.0000 Length:1273
## Class :character 1st Qu.:0.8700 Class :character
## Mode :character Median :0.9200 Mode :character
## Mean :0.9042
## 3rd Qu.:0.9400
## Max. :1.0000
## NA's :25
## Student Achievement Rating avgELA avgMath
## Length:1273 Min. :1.810 Min. :1.830
## Class :character 1st Qu.:2.250 1st Qu.:2.300
## Mode :character Median :2.450 Median :2.580
## Mean :2.534 Mean :2.668
## 3rd Qu.:2.760 3rd Qu.:2.980
## Max. :3.930 Max. :4.200
## NA's :55 NA's :55
## elaAll elaAll4 elaBlack elaHispanic
## Min. : 0.00 Min. : 0.000 Min. : 0.000 Min. : 0.000
## 1st Qu.: 0.00 1st Qu.: 0.000 1st Qu.: 0.000 1st Qu.: 0.000
## Median : 0.00 Median : 0.000 Median : 0.000 Median : 0.000
## Mean : 52.15 Mean : 7.317 Mean : 0.916 Mean : 1.519
## 3rd Qu.: 74.00 3rd Qu.: 4.000 3rd Qu.: 0.000 3rd Qu.: 1.000
## Max. :743.00 Max. :261.000 Max. :59.000 Max. :62.000
##
## elaAsian elaWhite mathAll mathAll4
## Min. : 0.000 Min. : 0.00 Min. : 0.00 Min. : 0.000
## 1st Qu.: 0.000 1st Qu.: 0.00 1st Qu.: 0.00 1st Qu.: 0.000
## Median : 0.000 Median : 0.00 Median : 0.00 Median : 0.000
## Mean : 2.254 Mean : 1.93 Mean : 43.84 Mean : 4.908
## 3rd Qu.: 0.000 3rd Qu.: 0.00 3rd Qu.: 59.00 3rd Qu.: 1.000
## Max. :203.000 Max. :116.00 Max. :652.00 Max. :312.000
##
## mathBlack mathHispanic mathAsian mathWhite
## Min. : 0.0000 Min. : 0.0000 Min. : 0.000 Min. : 0.0000
## 1st Qu.: 0.0000 1st Qu.: 0.0000 1st Qu.: 0.000 1st Qu.: 0.0000
## Median : 0.0000 Median : 0.0000 Median : 0.000 Median : 0.0000
## Mean : 0.6096 Mean : 0.9466 Mean : 1.983 Mean : 0.9701
## 3rd Qu.: 0.0000 3rd Qu.: 0.0000 3rd Qu.: 0.000 3rd Qu.: 0.0000
## Max. :107.0000 Max. :71.0000 Max. :246.000 Max. :126.0000
##
## enroll registered took regPct
## Min. : 55.00 Min. : 4.00 Min. : 3.429 Min. :0.0005
## 1st Qu.: 64.75 1st Qu.:12.38 1st Qu.: 8.237 1st Qu.:0.0016
## Median : 79.66 Median :22.62 Median :10.417 Median :0.0029
## Mean : 88.15 Mean :27.33 Mean :12.983 Mean :0.0034
## 3rd Qu.: 98.88 3rd Qu.:33.41 3rd Qu.:16.188 3rd Qu.:0.0042
## Max. :205.00 Max. :90.00 Max. :32.000 Max. :0.0100
## NA's :1251 NA's :1251 NA's :1251 NA's :1251
## tookPct yield academicScore quantRigor
## Min. :0.0004 Min. :0.2196 Min. :3.790 Min. :0.000
## 1st Qu.:0.0011 1st Qu.:0.4194 1st Qu.:4.550 1st Qu.:2.000
## Median :0.0015 Median :0.5499 Median :5.030 Median :3.000
## Mean :0.0016 Mean :0.5697 Mean :5.202 Mean :2.808
## 3rd Qu.:0.0019 3rd Qu.:0.7162 3rd Qu.:5.720 3rd Qu.:4.000
## Max. :0.0046 Max. :0.9412 Max. :8.080 Max. :4.000
## NA's :1251 NA's :1251 NA's :55
## quantCollab quantSupp quantLeader quantCommunity
## Min. :0.000 Min. :0.000 Min. :0.00 Min. :0.000
## 1st Qu.:3.000 1st Qu.:2.000 1st Qu.:2.00 1st Qu.:2.000
## Median :3.000 Median :3.000 Median :3.00 Median :3.000
## Mean :2.989 Mean :2.853 Mean :2.72 Mean :2.535
## 3rd Qu.:4.000 3rd Qu.:3.000 3rd Qu.:3.00 3rd Qu.:3.000
## Max. :4.000 Max. :4.000 Max. :4.00 Max. :4.000
##
## quantTrust pctELA4 pctELABlack pctELAHispanic
## Min. :0.000 Min. :0.0000 Min. :0.0000 Min. :0.0000
## 1st Qu.:3.000 1st Qu.:0.0303 1st Qu.:0.0000 1st Qu.:0.0000
## Median :3.000 Median :0.0697 Median :0.0000 Median :0.2000
## Mean :2.901 Mean :0.1167 Mean :0.2395 Mean :0.3249
## 3rd Qu.:4.000 3rd Qu.:0.1528 3rd Qu.:0.4286 3rd Qu.:0.5714
## Max. :4.000 Max. :0.8571 Max. :1.0000 Max. :1.0000
## NA's :712 NA's :768 NA's :768
## pctELAAsian pctELAWhite pctMath4 pctMathBlack
## Min. :0.0000 Min. :0.0000 Min. :1 Min. :0.0000
## 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:1 1st Qu.:0.0000
## Median :0.0000 Median :0.0000 Median :1 Median :0.0000
## Mean :0.1178 Mean :0.1158 Mean :1 Mean :0.2023
## 3rd Qu.:0.1600 3rd Qu.:0.1364 3rd Qu.:1 3rd Qu.:0.2614
## Max. :0.8750 Max. :1.0000 Max. :1 Max. :1.0000
## NA's :768 NA's :768 NA's :902 NA's :902
## pctMathHispanic pctMathAsian pctMathWhite URM4
## Min. :0.0000 Min. :0.000 Min. :0.0000 Min. : 0.000
## 1st Qu.:0.0000 1st Qu.:0.000 1st Qu.:0.0000 1st Qu.: 0.000
## Median :0.1139 Median :0.000 Median :0.0000 Median : 0.000
## Mean :0.2987 Mean :0.168 Mean :0.1052 Mean : 3.991
## 3rd Qu.:0.5000 3rd Qu.:0.250 3rd Qu.:0.0033 3rd Qu.: 3.000
## Max. :1.0000 Max. :1.000 Max. :1.0000 Max. :212.000
## NA's :902 NA's :902 NA's :902
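One way to understand performance is to count the URM students who score 4s on their exams. Rather than looking at all schools, we focus on the strongest ones: schools whose count of URM 4s (URM4) is more than one standard deviation above the mean. This produces a list of schools that PASSNYC can target with its programs to further bolster their numbers. Under a normal distribution, a one-standard-deviation cutoff would select roughly the top 16% of schools, and selecting the top 2.5% would take about two standard deviations. URM4, however, is heavily right-skewed (median 0, mean 3.99, maximum 212), and the resulting list below has 62 schools, about 5% of the 1,273 in the data: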
# school_2.5 holds the schools above the cutoff; sort them by count, highest first
school_2.5 <- school_2.5 %>%
  arrange(desc(count))
school_2.5
## # A tibble: 62 x 2
## school count
## <chr> <int>
## 1 SUCCESS ACADEMY CHARTER SCHOOL - HARLEM 1 212
## 2 I.S. 145 JOSEPH PULITZER 133
## 3 I.S. 73 - THE FRANK SANSIVIERI INTERMEDIATE SCHOOL 93
## 4 M.S. 180 DR. DANIEL HALE WILLIAMS 57
## 5 I.S. 227 LOUIS ARMSTRONG 55
## 6 J.H.S. 383 PHILIPPA SCHUYLER 55
## 7 P.S. 189 THE BILINGUAL CENTER 51
## 8 I.S. 230 50
## 9 SCHOLARS' ACADEMY 50
## 10 ACHIEVEMENT FIRST BUSHWICK CHARTER SCHOOL 49
## # ... with 52 more rows
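The gap between "one standard deviation above the mean" and "top 2.5%" can be checked directly. The first three lines give the normal-theory tail areas. The rest applies both rules to a made-up, right-skewed vector shaped loosely like URM4 (many zeros and a long tail). It shows that the share a mean-plus-sd rule selects depends on the distribution, while quantile() targets a share explicitly.

```r
# Normal-theory tail areas: share of values above mean + k * sd
pnorm(1, lower.tail = FALSE)   # about 0.159 (top ~16%)
pnorm(2, lower.tail = FALSE)   # about 0.023 (top ~2.3%)
qnorm(0.975)                   # about 1.96 sd marks the top 2.5%

# Toy right-skewed counts (made up): many zeros and a long tail
x <- c(rep(0, 50), 1:40, seq(50, 140, by = 10))

cutoff_sd <- mean(x) + sd(x)      # one-sd rule
cutoff_q  <- quantile(x, 0.975)   # explicit top-2.5% rule

mean(x > cutoff_sd)   # share selected by the one-sd rule
mean(x > cutoff_q)    # share selected by the quantile rule
```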
### Recommendation 1: Geolocation Strategy

Based on the information above, there is a clear corridor of high-need, high-URM schools with high scores. PASSNYC can concentrate its marketing and outreach efforts in this area to raise awareness of its services and gain buy-in from parents, teachers, and administrators.
## # A tibble: 29 x 70
## school DBN district lat long address City zip commSchool eni
## <chr> <chr> <int> <dbl> <dbl> <chr> <chr> <int> <chr> <dbl>
## 1 THE M~ 06M2~ 6 40.8 -74.0 71-111~ NEW ~ 10027 No 0.724
## 2 P.S. ~ 08X3~ 8 40.8 -73.8 2750 L~ BRONX 10465 No 0.388
## 3 P.S. ~ 15K1~ 15 40.7 -74.0 825 4T~ BROO~ 11232 No 0.715
## 4 ROBER~ 24Q5~ 24 40.7 -73.9 47-07 ~ LONG~ 11101 No 0.513
## 5 P.S. ~ 25Q1~ 25 40.8 -73.8 128-02~ COLL~ 11356 No 0.454
## 6 P.S. ~ 27Q2~ 27 40.7 -73.8 84-40 ~ RICH~ 11418 No 0.603
## 7 ALL C~ 32K5~ 32 40.7 -73.9 321 PA~ BROO~ 11237 No 0.667
## 8 ACHIE~ 84K5~ 32 40.7 -73.9 1300 G~ BROO~ 11237 No 0.718
## 9 CENTR~ 84Q0~ 24 40.7 -73.9 55-30 ~ ELMH~ 11373 No 0.688
## 10 SOUTH~ 84X3~ 12 40.8 -73.9 977 FO~ BRONX 10459 No 0.756
## # ... with 19 more rows, and 60 more variables: income <dbl>,
## # pctELL <dbl>, pctAsian <dbl>, pctBlack <dbl>, pctHispanic <dbl>,
## # pctBlackHispanic <dbl>, pctWhite <dbl>, pctAttend <dbl>,
## # pctAbsentChronic <dbl>, pctRigor <dbl>, ratingRigor <chr>,
## # pctCollab <dbl>, ratingCollab <chr>, pctSupp <dbl>, ratingSupp <chr>,
## # pctLeader <dbl>, ratingLeader <chr>, pctCommunity <dbl>,
## # ratingCommunity <chr>, pctTrust <dbl>, ratingTrust <chr>, `Student
## # Achievement Rating` <chr>, avgELA <dbl>, avgMath <dbl>, elaAll <int>,
## # elaAll4 <int>, elaBlack <int>, elaHispanic <int>, elaAsian <int>,
## # elaWhite <int>, mathAll <int>, mathAll4 <int>, mathBlack <int>,
## # mathHispanic <int>, mathAsian <int>, mathWhite <int>, enroll <dbl>,
## # registered <dbl>, took <dbl>, regPct <dbl>, tookPct <dbl>,
## # yield <dbl>, academicScore <dbl>, quantRigor <dbl>, quantCollab <dbl>,
## # quantSupp <dbl>, quantLeader <dbl>, quantCommunity <dbl>,
## # quantTrust <dbl>, pctELA4 <dbl>, pctELABlack <dbl>,
## # pctELAHispanic <dbl>, pctELAAsian <dbl>, pctELAWhite <dbl>,
## # pctMath4 <dbl>, pctMathBlack <dbl>, pctMathHispanic <dbl>,
## # pctMathAsian <dbl>, pctMathWhite <dbl>, URM4 <int>
Examining the high-potential (hi-po) population, we can see that Districts 4, 5, 7, 8, 9, 11, 23, 24, and 27 contain hi-po students.
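# na.omit() drops every row with an NA in ANY column; since regPct is NA for
# 1,251 of 1,273 schools, at most 22 rows remain for the correlations.
numeric <- na.omit(school_clean)
numeric %>%
  select(eni, income, yield, enroll, regPct, tookPct, starts_with('pct'), starts_with('avg'), starts_with('quant')) %>%
  # pctMath4 is always 1 (see summary), so cor() warns that a standard deviation is zero
  cor() %>%
  corrplot(type = "upper", method = "square")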
## Warning in cor(.): the standard deviation is zero
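The warning above comes from a constant column: pctMath4 has a minimum and maximum of 1 in the summary, which also suggests it was computed with the wrong denominator upstream. A constant column has zero variance, so its correlations are undefined and come back as NA. The sketch below, using made-up data, drops zero-variance columns before calling cor(). It also uses use = "pairwise.complete.obs" instead of removing whole rows. Both choices are suggestions rather than the original method.

```r
# Toy numeric data (made up): pctMath4 is constant, as in the summary above
toy_num <- data.frame(
  eni      = c(0.9, 0.7, 0.5, 0.3),
  avgMath  = c(2.3, 2.6, 2.9, 3.4),
  pctMath4 = c(1, 1, 1, 1)
)

# Reproduce the warning from the constant column
warn_msg <- tryCatch(cor(toy_num), warning = function(w) conditionMessage(w))

# Keep only columns with a positive standard deviation (ignoring NAs)
has_var <- vapply(toy_num, function(col) isTRUE(sd(col, na.rm = TRUE) > 0), logical(1))
toy_var <- toy_num[, has_var, drop = FALSE]

# Each pair of columns uses every row where both values are present
cm <- cor(toy_var, use = "pairwise.complete.obs")
cm
```

Pairwise correlations can rest on different subsets of schools, so it is worth checking how many rows support each pair before reading too much into the plot.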